Abstract

This report is the final report of our research project on the codification of
concurrent programming knowledge. This project covers the period from July
1, 19?? to September 31, 1981.

The general goal of research in this area is to codify programming knowledge
and to create programming systems that employ this knowledge to assist in
various programming activities including specification, synthesis, modification,
debugging, and maintenance. This project is concerned with the codification of
programming knowledge for concurrent programming. Our aim is to produce
knowledge-based design tools to help with problems in this area.

This paper primarily raises some questions that must be addressed in a study of
a more focused area, namely that of generation of concurrent microcode.

We first introduce a basic parallelism operator. The intent is to refine parallel
programs specified using this operator into microcode. We discuss briefly how
the hardware architecture affects the level of parallelism exploited in the
microcode. Then we discuss issues in the automatic generation of compact yet fast
microcode. Some advantages of microcode programming by refinement of
high-level specifications are brought up, namely exploiting high-level parallelism
and assurance of correctness of the resulting code. The refinement paradigm requires
intermediate-level constructs and a search for efficient implementations, which are
discussed. An example is devised to see if macroparallelism in the high-level
specification is carried over into the microcode.
§1 Introduction

This research is concerned with the building of hardware, firmware, and software
tools for the design of efficient, reliable, and maintainable computing systems.
These tools can use knowledge about the programming process to assist the
system builder with many tasks, including design, specification, coding,
debugging, modification, documentation, maintenance, and reliability assurance, all
in a cost-efficient manner [Parnas] [Phillips]. Our current research deals mainly
with knowledge-based concurrent programming and the process of transforming
high-level specifications into microcode.

In [Green], we presented some very-high-level constructs with which we could
straightforwardly  specify  problems  that  had  different  degrees  of  concurrency. 
We  also  presented  a  calculus  which  allowed  us  to  transform  these  specifications 
into  different,  but  equivalent,  programs.  The  calculus  consisted  of  a  set  of 
transformation  rules  which  enabled  us  to  go  from  the  specification  to  different 
lower-level  target  expressions. 

In this report, we consider the use of the successive refinement paradigm to
transform very-high-level program specifications into the microcode target level, thus
integrating the software and firmware development process. This integration
requires the extension of the body of transformation rules from the domain of
very-high-level expressions down to that of microcode. The use of the
knowledge-based transformational approach should diminish the conflict between efficiency
and ease of specification. It may also reduce some of the exponential search
problems posed by microcode optimization [Dasgupta79], through the use of
information carried down during transformation from higher abstraction levels.
For example, the high-level specification indicates which pieces of microcode are
completely independent and can be run in parallel, thus avoiding search that only
re-discovers possible parallelism. Also, a knowledge-based system using stepwise
refinement of high-level specifications allows efficiency estimation of alternative
implementations, avoiding blind search through the space of the possible
clusterings of micro-operations.


1.1  Summary  of  Previous  Work 

Most  of  our  work  has  been  covered  in  our  previous  report  [Green].  Below  we 
provide  a  brief  summary  of  that  report. 

The report covered the extension of our work in sequential programming into the
area of concurrent programming. First, we extended our language for expressing
program synthesis rules to allow the expression of concurrent programming
knowledge. Given that extension, we were able to express elementary
transformation rules that map a high-level program (which is non-committal with respect
to sequentiality or parallelism) into various concurrent versions. We gave an
example in which we synthesized several versions of a simple retrieval program,
with the versions having different degrees of concurrency.

An  example  we  studied  in  detail  is  a  concurrent  odd-even  transpose  sort,  which 
is  a  type  of  sorting  network.  Transpose  sort  did  not  appear  to  fit  directly  into 
our  stepwise  refinement  paradigm  for  program  construction.  We  proved  the 
correctness  of  the  algorithm,  and  presented  a  derivation  which  suggests  the  need 
for  extending  the  stepwise  refinement  paradigm. 

We  briefly  examined  more  complex  algorithms  including  concurrent  shortest 
path  and  prime-finding  algorithms.  These  derivations  appear  to  be  as  tractable 
as  their  sequential  versions  of  comparable  complexity. 

The report included background material on our general approach to program
synthesis. A tutorial explained how knowledge can be codified as a collection of
rules which may be used to transform a specification into a program. The report
also discussed which reorderings of computations are allowable, that is, which
do not violate constraints imposed by the hardware or by the high-level
specifications.


§2  Parallel  Constructs 

In [Green], we proposed some basic constructs which allowed us to specify highly
parallel problems [Fuller]. The basic parallelism operator is //, which is applied
to a function and a collection of arguments. // applies the function to each of
the elements of the collection, but in any order, or in parallel.

The  result  returned  by  the  //  statement  consists  of  a  collection  of  the  results  of 
each  application.  If  the  given  collection  of  arguments  is  a  set,  the  results  will 
be  encapsulated  in  a  set.  If  they  are  given  as  a  list,  the  order  of  the  results  will 
be  preserved  as  in  the  given  list  of  arguments.  We  delimit  collections  with  angle 
brackets  when  we  talk  about  abstract  collections,  but  we  use  curly  brackets  or 
parentheses  for  sets  or  lists  respectively. 

An application started by // can terminate in three ways: (1) return a value as
in standard LISP; (2) return EMPTY (distinct from NIL); or (3) abort all other
parallel computations generated by this // and return a single value. For example,

(// op (arg1 arg2 ... argn))

would cause the parallel application of op to each argi, and the result would be

((op arg1) (op arg2) ...).

If any of the applications of op returns EMPTY, that value will simply not
appear in the result list. This enables us to represent filters. We also use a
feature which allows us to stop every sibling process when one of these processes
says (Alldone <Value>). This feature is particularly useful for parallel searches,
because it allows us to stop all the parallel processes which are seeking some goal
as soon as one of them attains it.
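The termination rules above can be illustrated with a minimal Python sketch. The
names `par`, `EMPTY`, and the exception class `Alldone` are hypothetical stand-ins
for the // operator and its two special results; a thread pool is assumed as the
source of concurrency, which the report itself does not prescribe.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sentinel distinct from None, mirroring "EMPTY (distinct from NIL)".
EMPTY = object()


class Alldone(Exception):
    """Raised by one application to stop its siblings and return one value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def par(op, collection):
    """Hypothetical rendering of (// op collection)."""
    items = list(collection)
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        futures = [pool.submit(op, item) for item in items]
        try:
            for fut in as_completed(futures):
                fut.result()  # re-raises Alldone if an application signalled it
        except Alldone as done:
            # Best effort only: applications that are already running cannot
            # be pre-empted by Python threads, so they run to completion.
            for fut in futures:
                fut.cancel()
            return done.value
        results = [fut.result() for fut in futures]  # argument order
    kept = [r for r in results if r is not EMPTY]  # EMPTY acts as a filter
    # A set of arguments yields a set; any other collection yields a list.
    return set(kept) if isinstance(collection, (set, frozenset)) else kept
```

Under these assumptions, `par(lambda x: x if x % 2 == 0 else EMPTY, [1, 2, 3, 4])`
evaluates to `[2, 4]`, and an application that raises `Alldone(v)` makes the whole
call return `v` alone.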

The way to start multiple processes is:

(// Identity (proc1 ... procn)).

Processes are independent computations, and no extra operator is applied once
the arguments are evaluated. That is, once the processes have run (each one
deciding by itself if it returns any value or not), the collection of their results
becomes the result of the // statement. To simplify the notation, when the operator
is just Identity, it need not be written. Hence (// <collection of processes>) will
start a group of parallel processes and terminate when one of them flags Alldone,
or when all the processes are done normally.

On the other hand, we may want to apply the same function as a filter to all
the elements of a set. This can be specified as:

(// filter <collection of elements>)

This provides a multiway fork-join facility. There is no need for more concurrency
control for the so-called “internal concurrency problems.” Note that the join is
well defined because all the arguments have to be evaluated so that the operator
may be applied.


§3  Implementation  with  Microcode 

Our calculus must be extended with new rules to transform programs from //
constructs down to the microcode level. STRUM [Patterson] and S* [Dasgupta78]
provide useful ideas for high-level microcode constructs.

The hardware architecture could facilitate the development of refinement
derivations by providing features which resemble the higher-level constructs. For
example, [Arden] presents the MP/C approach, a concurrent computing
environment having the shared-memory aspects of tightly-coupled multiprocessors and
also the characteristics usually associated with loosely-coupled message-oriented
systems. A large address space is dynamically partitioned in a hierarchical way
through the use of dedicated switches which control the common bus through
which the processors can access segments of a linear memory. These switches are
also connected with a command bus which enables them to implement powerful
operations like the ones needed for the // construct in a direct way. Hence, better
use of the processors could be made through the non-committal very-high-level
specification of the programs.


Fig. 1: Transformation and interpretation


3.1  Space  and  Time  Efficiency  Considerations 

Since microcode affects every operation, it cannot be allowed to be inefficient.
Microcode can be viewed as the innermost loop of every computation, so that if the
microcode is inefficient, every computation is inefficient. Therefore, the
synthesized microcode should approach hand-coded efficiency.

A compaction problem arises when the high-level microcode is transformed into
elementary control words. The micro-compilers for general-purpose machines
first generate sequences of micro-operations which are subject to some constraints
(a partial ordering in time and limited resources on which to run
these micro-operations). Then, these operations are bundled into control words
in the most compact way which satisfies the aforementioned constraints. This
compaction problem has been solved with only limited success in the past: when
the microcode is mostly vertical, microcompilation as well as verification is
not difficult. However, when the control words are horizontal, and there is an
inherently high degree of microparallelism, the problem grows rapidly with the
size of the input.
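To make the compaction step concrete, the following Python sketch bundles a
sequence of micro-operations into control words with a greedy first-fit heuristic.
It is an illustration under stated assumptions, not the method of any
micro-compiler cited here: the function name `compact`, the operation names, and
the model of resources as a count of identical functional units per cycle are all
hypothetical, and only true data dependencies are modelled.

```python
def compact(ops, deps, unit, capacity):
    """Greedy first-fit bundling of micro-operations into control words.

    ops      -- operation names, listed in a valid sequential order
    deps     -- op -> set of ops that must sit in an earlier control word
    unit     -- op -> name of the functional unit the op occupies
    capacity -- unit name -> number of such units usable in one cycle
    Returns a list of control words, each a list of operation names.
    """
    words = []
    placed = {}  # op -> index of the control word that holds it
    for op in ops:
        # The partial order: an op goes strictly after all its predecessors.
        w = max((placed[d] + 1 for d in deps.get(op, ())), default=0)
        while True:
            if w == len(words):
                words.append([])
            in_use = sum(1 for other in words[w] if unit[other] == unit[op])
            if in_use < capacity[unit[op]]:  # the resource limit
                break
            w += 1
        words[w].append(op)
        placed[op] = w
    return words


ops = ["load_a", "load_b", "add", "store", "load_c"]
deps = {"add": {"load_a", "load_b"}, "store": {"add"}}
unit = {"load_a": "mem", "load_b": "mem", "load_c": "mem",
        "store": "mem", "add": "alu"}

# One memory port and one ALU: five operations fit in four control words.
four_words = compact(ops, deps, unit, {"mem": 1, "alu": 1})
# Two memory ports: the same operations fit in three control words.
three_words = compact(ops, deps, unit, {"mem": 2, "alu": 1})
```

The heuristic visits each operation once and never revisits a placement, so it is
fast but not guaranteed to find the shortest microprogram; an exhaustive search
over placements is what grows rapidly with the size of the input.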

The advantage of hand-coded microcode is that it can be very fast, but it is very
difficult to ensure its correctness. The advantage of automatically generated
microcode is that it can be proven correct with respect to higher-level
specifications. Its disadvantage is that until now no one has automatically
generated acceptably efficient microcode for horizontal microarchitectures.

One  possibly  bad  approach  is  to  generate  microcode  automatically  and  then 
try  to  optimize  it.  There  are  two  key  aspects  to  be  optimized:  time  and 
space.  Clearly,  these  are  interdependent.  Reducing  space  may  be  achieved 
by  reducing  either  the  size  or  the  number  of  control  words.  Increasing  speed 
requires  a  high  degree  of  microparallelism  and  short  decoding  delays  for  the 
control  words.  These  last  two  considerations  are  in  conflict  with  the  reduction 
of  word  size  because  a  tighter  encoding  organization  allows  fewer  possibilities  for 
microparallelism.  Also,  the  requirements  of  more  complex  decoding  introduce 
an  additional  delay  in  the  execution  of  each  microprogram. 

There is a disadvantage in using a maximal encoding scheme, apart from speed
considerations. Namely, adding a new instruction requires a new state and
new encoding and decoding logic, all of which implies hardware change. In the
maximal parallelism alternative, as many micro-operations can be run
simultaneously as the hardware allows (that is, as many additions can be run at once
as there are adders and appropriate data paths), and there is no need for any
decoding at all, since each bit of the control word directly controls the hardware
involved in a micro-operation. Because of this, it is frequently called direct control.

An intermediate approach can avoid the lack of flexibility and parallelism of the
first alternative and the excessive word length of the second. In this minimally
encoded approach, a group of microoperations which are never executed together
(that is, they are mutually exclusive during a microcycle) are encoded together
in one field. Hence, there is a partition of the totality of microoperations into
$k$ sets, the $i$-th of which contains $m_i$ microoperations. This partitioning
allows for an encoding with a total control word length

$$|w| = \sum_{i=1}^{k} \lceil \log_2 m_i \rceil .$$

This intermediate approach is the one in which the optimization difficulties
appear. The problem consists of minimizing the total number of bits $|w|$
subject to the given mutual-exclusion constraints among the microoperations.
Although this requires a separate decoder for each field, each decoder is very
simple since the fields are small.
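The word-length trade-off can be checked numerically. The Python sketch below
compares direct control (one bit per micro-operation) with a field-encoded word
whose length is the sum over fields of the ceiling of log2 of the field size. The
function names, the sample fields, and the optional `reserve_noop` flag (which
reserves one extra code per field for "no operation") are assumptions made for
illustration and are not taken from the report.

```python
def direct_control_bits(fields):
    """One control bit per micro-operation, so no decoding is needed."""
    return sum(len(field) for field in fields)


def field_encoded_bits(fields, reserve_noop=False):
    """Sum over fields of ceil(log2(m_i)), with m_i the field size.

    With reserve_noop=True each field gets one extra code meaning
    "no operation" (an assumption added here, not stated in the report).
    """
    total = 0
    for field in fields:
        codes = len(field) + (1 if reserve_noop else 0)
        # For an integer codes >= 1, (codes - 1).bit_length() equals
        # ceil(log2(codes)) without any floating-point rounding.
        total += (codes - 1).bit_length()
    return total


# Hypothetical partition into k = 3 fields of mutually exclusive operations.
fields = [
    ["add", "sub", "and", "or"],    # m_1 = 4 -> 2 bits
    ["load", "store"],              # m_2 = 2 -> 1 bit
    ["shift_l", "shift_r", "rot"],  # m_3 = 3 -> 2 bits
]

direct = direct_control_bits(fields)                   # 9 bits
encoded = field_encoded_bits(fields)                   # 5 bits
with_noop = field_encoded_bits(fields, reserve_noop=True)  # 7 bits
```

For this partition the field-encoded word needs 5 bits against 9 for direct
control, at the price of three small decoders and of allowing at most one
operation per field in any cycle.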
The problem of minimizing the length of the microcode is closer to the traditional
code optimization problems found in compilers, except for some timing
restrictions and hardware dependencies that are intrinsic to microcode optimization.
This similarity comes from the fact that in both cases there is a set of limited
resources to assign to a computation, and a space of admissible solutions, with
time/space tradeoffs. Of course, there are differences which make the techniques
of software optimization not directly applicable to this kind of optimization.

Since  microcode  generated  from  higher-level  specifications  must  be  optimized,  a 
transformational  approach  that  modifies  a  specification  along  a  refinement  path 
can  be  very  advantageous.  By  its  mere  existence,  this  path  indicates  the  validity 
and  satisfaction  of  all  the  constraints. 

We  should  note  that  there  are  two  efficiency  considerations:  the  efficiency  of 
the  search  in  the  space  of  implementations,  and  the  efficiency  of  the  resulting 
code.  These  are  directly  related  because  the  higher  the  search  efficiency,  the 
more  alternatives  can  be  explored  for  the  target  code. 


3.2  Correctness 

The second reason to prefer the transformational approach is that it helps ensure
the correctness of the synthesized code. [Patterson] reports that several
errors were uncovered through the systematic verification of microcode generated
from a higher-level specification language. In our system, verification rules can
use the same rule-manipulation machinery, but most of them need not appear
in the system in any explicit way, since the transformation rules already fulfill
that purpose by transforming correct specifications into correct microcode. Of
course, two requirements remain to ensure the correctness of the resulting
microcode. The first is to show that the transformation rules preserve correctness.
That is, the soundness and consistency of the synthesis system must be proved,
but that need be done only once for all the programs to be generated. The second
is to improve the chances that the specifications themselves are correct. We can
significantly improve these chances by providing notations which facilitate the
specification of problems in a way consistent with our ideas of what we want to
specify. This key issue for the true total correctness of programs is frequently
neglected in traditional program correctness proving. We think that the clarity
and simplicity achieved through very-high-level languages using constructs such
as the ones we use are one step in the right direction for this problem.


For completeness, it should be noted that the equivalence between the original
specifications and the target code does not yet include the termination properties
of the programs. That is, although the specification of a program could be
correct, and have a terminating solution, once some sequences of transformations
are applied, the resulting code could no longer be guaranteed to terminate. This
is a limitation of the calculus we presented, but, as was suggested in [Green],
that issue could be taken care of by a heuristic technique which would handle
the termination problem. While pruning the less efficient branches, the efficiency
expert might also be able to eliminate most non-terminating ones.


§4  The  Space  of  Refinements 

One reason knowledge-based synthesis is appropriate is that a key aspect of it
involves the generation of alternative implementations. Since the generation/use
ratio is even lower for firmware than for systems software, it is viable to spend
more time and resources in the generation and search of alternative
implementations. This should not give false hopes: exponentially growing problems, even
those that appear small, may be for all practical purposes unsolvable. Microcode
optimization has been shown to be NP-hard, which is one more reason to prefer
a knowledge-based heuristic approach.

In  a  rule-based  system,  instead  of  having  to  create  a  complex  algorithm  which 
will  generate  valid  microcode,  the  constraints  are  incorporated  as  rules  in  a 
homogeneous  system  of  specification  transformations.  In  this  way,  optimization 
can  occur  in  small  intermediate  steps.  Every  branch  need  not  be  improved,  since 
many  of  them  will  be  discarded  anyway.  This  avoids  leaving  all  the  optimization 
for the end, when all the higher-level information will have been lost. Most of the
work done on micro-compilers up to now has used either specific algorithms or
blind searches; none has used a knowledge-based search for the transformation.
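A rule-based search of this kind can be sketched in a few lines of Python. The
sketch below is a toy under stated assumptions: expressions are plain tuples, the
four rules and the cost table are invented for illustration, and a best-first
search guided by an efficiency estimate stands in for whatever control strategy
a real synthesis system would use.

```python
import heapq
from itertools import count


def refine(spec, rules, is_target, estimate):
    """Best-first search over refinements, cheapest-looking branch first."""
    tie = count()  # tie-breaker so the heap never compares expressions
    frontier = [(estimate(spec), next(tie), spec, [])]
    seen = {spec}
    while frontier:
        _, _, expr, path = heapq.heappop(frontier)
        if is_target(expr):
            return expr, path
        for name, rule in rules:
            for new in rule(expr):  # a rule returns zero or more rewrites
                if new not in seen:
                    seen.add(new)
                    heapq.heappush(
                        frontier, (estimate(new), next(tie), new, path + [name])
                    )
    return None, []


# Hypothetical rules: each rewrites the head of a tuple expression.
def rewrite(src, *dst):
    return lambda e: [dst + e[1:]] if e[0] == src else []


rules = [
    ("par->serial", rewrite("par", "serial")),
    ("par->fork", rewrite("par", "fork")),
    ("serial->ucode", rewrite("serial", "ucode", "one-alu-loop")),
    ("fork->ucode", rewrite("fork", "ucode", "overlapped")),
]

# Hypothetical efficiency estimates; lower is better.
COST = {"par": 0, "serial": 3, "fork": 2}


def estimate(e):
    if e[0] == "ucode":
        return 3 if e[1] == "one-alu-loop" else 2
    return COST[e[0]]


target, path = refine(("par", "min", "struct"), rules,
                      lambda e: e[0] == "ucode", estimate)
```

With these invented costs the search reaches `("ucode", "overlapped", "min",
"struct")` by way of `par->fork` and `fork->ucode`, and the serial branch is never
expanded, which is the sense in which branches that look poor need not be improved.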

There are two clearly distinct domains in which parallelism takes a central role
in microprograms. One could be called “micro-parallelism”, which refers to
the parallelism which appears in the control of the lowest level of the abstract
processor. This means that several of the sub-parts of a machine instruction
are done in a concurrent fashion by the micro-architecture. This is a very
fine-grained kind of concurrency, and has little similarity with software
concurrency; it is similar to the parallelism of the components of an abstract
electronic device.

Some of the key issues in concurrent systems involve the development of a
well-defined hierarchical structure in which lower levels implement the abstractions
used at higher levels. In this structure, the firmware can provide very efficient
implementations for approaches whose practicality is marginal if implemented in
software, but which become theoretically and practically sound if the firmware
provides adequate support. This brings us to several important questions
which relate to architecture and concurrent software development. Mainly, what
should be put in hardware, what in firmware, and what in software? This
raises still another question: how should we describe and formalize what to
implement at the different levels?

A  system  can  be  built  by  describing  a  sequence  of  refinements,  in  which  each 
level  supports  the  abstractions  defined  one  level  above  by  providing  an  adequate 
set  of  primitives  which  are  used  to  define  these  abstractions.  It  is  necessary 
to  introduce  islands  in  the  language  in  which  this  refinement  chain  is  expressed. 
The need for these islands arises from the significantly different kinds of problems
solved at each level.

Analogously, different constructs are used at each level to describe algorithms
throughout their refinement from software to hardware. At each level of
refinement, the formalism should be clear in its meaning. That is, it is not enough to
have just a clear high-level description which is transformed into unintelligible
code at lower levels. That there are different constructs in which the domain
problems have to be posed at each level does not preclude a uniform approach
for the refinement methodology.

Since there are various domains of objects which are populated by different kinds
of entities (e.g. registers, data paths, gates, control words, activation records,
sets,  etc.),  descriptions  are  needed  of  the  constraints  on  the  interactions  between 
these  components  to  generate  the  specified  behavior  of  the  system,  and  also  the 
constraints  which  define  the  valid  (or  even  possible)  modes  of  behavior.  For 
example,  at  the  software  level,  the  set  of  entry  points  to  a  class  gives  the  only 
possible  ways  to  interact  with  it.  At  the  firmware  level,  there  are  constraints  on 
the  number  of  possible  parallel  actions  that  can  be  initiated  in  a  cycle  because  of 
the  chosen  grouping  and  decoding  scheme.  At  the  hardware  level,  these  entities 
might  be  registers,  data  paths,  etc. 

This  description  system  needs  to  be  rich  enough  to  describe  any  part  for  which 
the  synthesis  system  is  to  generate  options  for  implementation.  On  the  other 
hand,  it  must  be  simple  enough  so  that  these  descriptions  can  be  manipulated 
easily (either manually or automatically). Earlier efforts which developed
high-level microcode languages with microcompilation and/or verification in mind can
suit our synthesis needs.

The verification of programs a posteriori is a much more difficult task than
constructing them to be correct (through correctness-preserving transformations),
so particular advantage can be taken of the decoupling between many of the
functions which result from software migration into hardware. Most of the programs
which have been automatically synthesized to date are relatively short, but this is
precisely the case with microprograms: they tend to be short and have little
interaction with other modules.

There is another domain where microprogramming has a
bearing on concurrency, and which corresponds to large-granularity parallelism:
communicating microprocesses, running on a net of microprocessors. Here, the
degree of variability of the micro-substrates is even wider than for
micro-architectures, and for the most part there is no developed formalism to deal with this
at the substrate description level. On the other hand, there are valid analogies
with the tools developed for dealing with concurrency at the software level, and
which are being used with increasing success in the operating systems area.

Although we should use the simplest and most abstract notation to achieve a
good degree of generality, there are some limitations which are again intrinsic to
microprogramming. Namely, it makes sense to talk about a software language
which runs on an abstract machine which implements the abstractions the
language uses (via a compiler or an interpreter running on some extended machine
[Dasgupta78]), but this cannot be used at the microprogramming level, since
levels of abstraction cannot be interposed unless they are implemented in the
hardware micro-architecture. Hence, we should extend the work which develops
tools to describe the full scope of variability of this domain, by writing high-level
microprograms which exploit the full possibilities of concurrency given by the
micro-architecture.


§5 Mapping High-Level Constructs into Microcode: an Example

We experimented with manually transforming a high-level expression into
microcode. We assumed a simple target architecture which was devoid of high-level
parallelism, but which allowed several microoperations to take place in the same
cycle. There were several registers, which allowed overlapped fetch and store, but
only one ALU, which limited the available parallelism for arithmetic operations.
The intention was to find out whether a system could take advantage of the
microparallelism when the specification of the problem allowed high-level
concurrency. The specification was

$\phi(\mathit{struct}) = \{\min(x) \mid x \in \mathit{struct}\}$

which means to find the set composed of the minimums of each of the subsets
which are in struct. For example: $\phi(\{\{1\ 3\ 6\}\{87\ 12\}\}) = \{1\ 12\}$. This is
non-committal as to a parallel or serial implementation. Noticing that the operation
min on each of the sets does not affect the value of the elements of the set, and
that to calculate min of a set only the values of the elements of the set are
needed, implies that the min of all the subsets can be computed in parallel.
Hence, using our // notation, the specification may be transformed into:

(// min struct)

which  could  be  computed  in  INTERLISP  with: 

(for  s  in  struct  collect  (min  s)). 

Of course, in neither of the last two transformations is there any indication of the
order in which these min operations are to be performed. Next, we transformed
this into a serial algorithm, then into a pidgin high-level microcode language,
with the expectation of being able to take advantage of all the microparallelism
features of the assumed architecture. At this point, the higher-level parallelism
could not be extracted (or detected) by looking at the microcode of the serial
algorithm. This seems to support our belief that the parallelism of an algorithm can
be better exploited at a high level. That is, there is no clear way in which we
could see the macroparallelism through microparallelism. It should be noted
that in spite of this, many special-purpose machines may have advantages at the
microparallelism level, in particular for very regular numerical algorithms.
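The specification of this example and its two readings can be mirrored in Python
as a small sketch. The function names `phi_serial` and `phi_parallel` are
hypothetical, and a thread pool is assumed as a stand-in for the // operator; the
point is only that both readings satisfy the same order-free specification.

```python
from concurrent.futures import ThreadPoolExecutor


def phi_serial(struct):
    """Serial reading: visit the subsets one after another."""
    result = set()
    for subset in struct:
        result.add(min(subset))
    return result


def phi_parallel(struct):
    """Parallel reading, in the spirit of (// min struct)."""
    # Each min depends only on its own subset, so the applications are
    # independent and may run in any order or at the same time.
    with ThreadPoolExecutor() as pool:
        return set(pool.map(min, struct))


# The example from the text: the minimums of {1 3 6} and {87 12}.
struct = [{1, 3, 6}, {87, 12}]
serial_result = phi_serial(struct)
parallel_result = phi_parallel(struct)
```

Both functions return `{1, 12}` for this input. Once only the serial loop is kept,
nothing in its body records that the iterations were independent, which is the
information the text reports as lost at the microcode level.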


§6  Conclusions 

We have brought out some issues concerning the use of the paradigm of
stepwise refinement to transform concurrent program specifications using
very-high-level constructs into microcode. We considered optimization in the light of this
paradigm, and how it could give us an edge over a strictly algorithmic approach.
Work is needed to select appropriate intermediate-level constructs, and to
develop further the base of refinement rules that code high-level parallelism.